14  Climate change: Impacts

Learning Objectives

After completing this tutorial you should be able to

You should have already downloaded the directory for this project here. You can open the Rproj for this module either by double clicking on it which will launch Rstudio or by opening Rstudio and then using File > Open Project or by clicking on the Rproject icon in the top right of your program window and selecting Open Project.

There should be a file named 14_impacts.qmd in that project directory. Use that file to work through this chapter - you will hand in your rendered (“knitted”) quarto file as your homework assignment. So, first thing in the YAML header, change the author to your name. You will use this quarto document to record your answers. Remember to use comments to annotate your code. Similarly, take notes in the document as we discuss discussion/reflection questions but make sure that you go back and clean them up for “public consumption”.

Before you get started, let’s make sure to read in all the packages we will need.

# load libraries
library(knitr)

library(ggplot2)
library(ggthemes)
library(patchwork)

library(dplyr)
library(tidyr)
library(readr)
library(magrittr)

# turn of sci notation
options(scipen=999)

14.1 A grammar of graphics

Now that we are confident(ish) in our data wrangling the next step is learning how we can visualize our data for exploration and for communication. We have already used figures using custom functions implemented in DADA2 that are based on the grammar of graphics/wrap functions from ggplot2 and we also familiarized ourselves with the syntax for using ggplot2 in the previous chapter.

Being able to visualize your data is an important skill throughout the process of data science. For example, in our next module we will dig into how to complete an exploratory analysis which is an important component of data science as it not only gives you an overview of what information is contained in a data set and can also help you refine the question you are asking. Finally, once you have refined and completed an analysis, visualizations are a key component to communicate your results.

ggplot2 is the core package of the tidyverse used for visualization. Similar to tidyr and dplyr having been built on the concept of a tidy data set, ggplot2 was built on a framework which follows a layered approach to describe and/or build any type of visualization in a structured way termed the “grammar of graphics”.

The grammar of graphics breaks down the components of any visualization into seven components.

  • Data is the component of data set to visualize1.
  • Mapping or Aesthetics include variables to be plotted, plot positions (e.g. x or y axis), and encodings such as size, shape, color2.
  • Geometric objects (or layers) are what you actually end up seeing in the visualization, e.g. points, lines, bars etc. to represent data. This can also include statistical transformations3.
  • Scales map the values in the aesthetic space you specified, this includes your scales for color, shape, and size, you can also explicitly specify scales for the x and y axis.
  • Coordinate system, includes all the usual suspects such as cartesian or polar along with some you may have never really heard of.
  • Facets, i.e. subplots based on multiple dimensions.
  • Statistics, including confidence intervals, means, quantiles, may additionally be added to the plot.
  • 1 You will see that this is either explicitly specified as the data argument (ggplot(data = df)), or you can just specify the data.frame or tibble to plot without typing in data = as ggplot(df).

  • 2 These components are specified as the mapping argument using aes(). Again you can explicitly call the argument as mapping =, i.e. ggplot(data = df, mapping = aes()), or you can just specify the mapping aesthetics using aes() because you are using the arguments in the correct sequence (ggplot(df, aes())) in which case R assumes they apply to these various arguments.

  • 3 Functions for these geometric objects will all start with geom_* or stat_*

  • Additionally, you can create and use themes that control additional aspects of your your figure is displayed including font size, background color, etc.4.

  • 4 The package ggthemes includes various themes but you will also find that you can create your custom themes in a pretty straightforward way and that many R users have creted themes that they have made accessible for other users.

  • Until now you’ve probably thought of each plot type as it’s own distinctive format (a scatter plot, a pie chart, etc) and have not considered that each plot can be broken down into these fundamental components, so at first this way of thinking of plots will likely not seem intuitive.

    But if you commit to thinking about plots in this abstract way you will quickly learn how these components fit together and realize the flexibility that this framework will give you to quickly generate exploratory plots, optimize your visualizations and be able to generate pretty much any plot you can think up5.

  • 5 You will also see that keeping your data tidy will allow you to customize your plots using different aesthetics and mappings to group your data.

  • 14.2 Building plots

    Y’all ready for this?

    Then, let’s go back to our global mean temperature anomaly data set to figure out how the grammar of graphics can be applied and plot some data!

    temperature <- read_delim("data/GLB.Ts+dSST.csv", delim = ",", skip = 1) %>%
      replace(. == "***", NA) %>%
      mutate_if(is.character, as.numeric)

    Recall how we created a line plot:

    ggplot(temperature, aes(x = Year, y = `J-D`)) +
      geom_line()

    We can generalize this into a basic template that can be used for different kinds of plots, including scatter plots, bar plots, and box plots.

    
    ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +
      <GEOM_FUNCTION>
    

    Essentially, the minimum components needed to make a plot are

    1. A data set (<DATA>), this will be a data.frame or a tibble6.
    2. Based on the data, we need to define what variables we are going to plot (<MAPPINGS>) and how we want them to be represented, i.e. what individual data points should look like and how they are encoded.
    3. We need to define the geometries, i.e. what type of plot (<GEOM_FUNCTION).
  • 6 Remember, when we use read_delim() our data is automatically a object both of the format tibble and data.frame; while a tibble has some additional properties but we can use them interchangeably

  • Let’s break down what is happening to get a better understanding of how these minimum components fit together to create individual plots.

    First, we use the ggplot() function to define the specific data.frame to use to build the plot using the data argument (<DATA>)7.

  • 7 Remember, that when we are using arguments in the specified sequence that they are defined you do not explicitly need to call them.

  • In our example that would be

    ggplot(data = temperature)

    Figure 14.1: Empty canvas generated using ggplot() function.

    Not much happens when you execute that piece of code other than creating a blank canvas in your plot panel8.

  • 8 Remember, if you are using and Rmarkdown or quarto document and have Chunk Output Inline enabled it will print beneath your code chunk

  • That is because we still need to define an aesthetic mapping using the aes() function (<MAPPING>).

    Here, we are selecting the variables that we want to plot (columns in our data.frame). At minimum we need to specify what columns we want to plot on the x and y axis, but we can also define variables (columns) we want to use to encode using color, shapes, size etc.

    ggplot(data = temperature, aes(x = Year, y = `J-D`))

    Figure 14.2: Coordinate system plotted after specifying mapping aesthetics, specifically which variables to plot on each axis.

    When we execute that code you will now get an empty coordinate system. This still seems like slim pickings but patience, young grasshopper, we have not yet defined the geom, i.e. what type of plot (“geometries”, <GEOM_FUNCTION>) we want to use. For example, here we want generate a line plot (geom_line()).

    For ggplot we add layers using the + operator, which is a pipe (similar to %>%) which tells R “and now add this”.

    ggplot(data = temperature, aes(x = Year, y = `J-D`)) +
      geom_line()

    Figure 14.3: Line plot (geom_line()) showing global mean temperature anomalies relative to 1951-1980 average.

    Now, we’re playing!

    One of the advantages of the layered framework of ggplot2 is that we can plot multiple layers in the same plot by adding additional geoms.

    For example, we can plot a scatterplot and line plot in the same plot as such:

    ggplot(data = temperature, aes(x = Year, y = `J-D`)) +
      geom_line() +
      geom_point()

    Figure 14.4: Line plot showing global mean temperature anomalies relative to 1951-1980 average. Individual data points added using geom_point().

    Now that you know how to generate a simple plot, let’s think about how we can further modify it to improve your visualization and how well you can communicate your results.

    For example, we can change the color, fill, size, and shape for each geom layer.

    In this example we are using the arguments for individual geom functions to change how data points are represented using colors and shapes for each layer, we can also use aes() to set mapping aesthetics for the entire plots.

    ggplot(data = temperature, aes(x = Year, y = `J-D`)) +
      geom_line(color = "darkblue", size = 1) +
      geom_point(shape = 21, color = "darkblue", fill = "white", size = 3)

    Figure 14.5: Line plot showing global mean temperature anomalies relative to 1951-1980 average. Individual data points added using geom_point(), color and fill for lines and shapes have been modified.

    Shapes for points can be specified using numbers; note how some shapes are “solid”, i.e. specifying color will define the color of the shape. Others, like the one used in our example are filled, i.e. color will determine the color of the outline, and fill the color of the space inside. For “hollow” shapes (or e.g. X’s) color will determine the line color.

    Numeric codes for sympols.

    You can specify colors either using the color names defined by R or using hex codes.

    Some exploring will also lead you to various color pallets, many of them put together by R users around the world. Wes Anderson fan? You can style your plots accordingly… dig the aesthetics of the old school National Park posters and images? R community got you covered..

    Previously, we also used geom_smooth() to add a layer with a linear regression. If you take a look at the arguments for this function using ?geom_smooth you will see that this has additional arguments apart from the mapping aesthetics like choosing the type of regression, whether or not the confidence interval is shown etc. Let’s say we want to add a red, dashed regression line without the confidence interval.

    ggplot(data = temperature, aes(x = Year, y = `J-D`)) +
      geom_line(color = "darkblue", size = 1) +
      geom_point(shape = 21, color = "darkblue", fill = "white", size = 3) +
      geom_smooth(stat = "smooth", se = FALSE, color = "red", linetype = "dashed", size = 2)

    Figure 14.6: Line plot showing global mean temperature anomalies relative to 1951-1980 average. Individual data points added using geom_point(), color and fill for lines and shapes have been modified. Trend line fitted using geom_smooth().

    Admittedly, this plot is now starting to look a little bit ridiculous and is probably make it less, not more easy for the viewer to undestand what they can learn from the data. Maybe we should think of alternative options for visualizing this data to explore some additional ggplot options.

    Another way to visualize this data would be using a bar plot.

    Can we appreciate for a second how easy it was to change how the same data is plotted? If you were using excel you would have had to insert a new plot, define what data you wanted to plot, relabel the axis, etc. Here all you had to do was change a single line of code.

    ggplot(data = temperature, aes(x = Year, y = `J-D`)) +
      geom_bar(stat = "identity")

    Figure 14.7: Bar plot showing global mean temperature anomalies relative to 1951-1980 average.

    Notice, how this is still the exact same mapping aesthetics (time/year on the x-axis and the temperature anomaly on the y-axis) but it changes the way we see the data completely - the emphasis now is on the change of temperatures below or above the long-term mean and the clear pattern of a shift of those temperature differences going from negative to being positive.

    Let’s say we want really lean into demonstrating this drastic change in temperatures not only being above the long-term average but also steadily increasing. One way to do this is to use color. We could modify this bar plot to have all of our bars representing years with global mean temperatures below the 1951-1980 average in one color and those above in another.

    To do this, we need a column that encodes that information… that of course is not an issue for us as pretty much professional data wranglers! We can use a simple conditional mutate using an ifelse()9 statement to add a column (year_type) that indicates whether temperatures are above or below the the long-term average.

  • 9 recall, that we previously used case_when() for a conditional mutate, when the condtion is binary (true/false or this/that) using ifelse() makes for a much shoerte syntax.

  • We can manipulate our data.frame using the dplyr functions we are already familiar with. We can even use the %>% pipe to pass the data argument directly to the ggplot() function10

  • 10 When you do this, remember that you need to switch back to using the + operator once you are adding your geom layers.

  • temperature %>%
      mutate(year_type = ifelse(`J-D` <=0, "colder", "hotter")) %>%
      ggplot(aes(x = Year, y = `J-D`, fill = year_type, color = year_type)) +
      geom_bar(stat = "identity")

    Figure 14.8: Bar plot showing global mean temperature anomalies relative to 1951-1980 average. Mapping aesthetics have been specified to color code according to variable included in the data set.

    Anything that we add to the ggplot() layer as a mapping is a universal plot setting that will apply to all subsequent layers. You can override these global settings is you specify aesthetics like color, shape or size specifically in an separate layer. For example if we do the following, we will override our color coding even though we specified it in our ggplot() layer.

    temperature %>%
      mutate(year_type = ifelse(`J-D` <=0, "colder", "hotter")) %>%
      ggplot(aes(x = Year, y = `J-D`, fill = year_type, color = year_type)) +
      geom_bar(stat = "identity", color = "blue", fill = "blue")

    Figure 14.9: Bar plot showing global mean temperature anomalies relative to 1951-1980 average. Mapping aesthetics have been specified to color code according to variable included in the data set. Settings in geom_bar() override the general plot specifications.

    Let’s see what else we can visualize using ggplot. Recall from when you explored our temperature anomaly data set, that this data set also contains information for seasons and individual months.

    Let’s take a look at the distribution of mean global winter temperatures (DJF = December, January, February) using a histogram, (geom_histogram()). Note that we only need to define our x-axis to plot a histogram, this is because ggplot2 will count how many observations fall into each bin for you … this geom is a mixture of a geometry (bar plot) and a statistical transformation (binning data points into ranges and the plotting those as bar plots).].

    ggplot(temperature, aes(x = DJF)) +
      geom_histogram()

    Figure 14.10: Histogram summarizing the distribution of mean global winter temperatures.

    We can refine our plot by choosing our own bin width and by manipulating fill and color.

    ggplot(temperature, aes(x = SON)) +
      geom_histogram(binwidth = 0.05, color = "black", fill = "darkorange")

    Figure 14.11: Histogram displaying distribution of mean global winter temperatures with customized bin width and coloration.

    For our example we might also want to add a vertical line (geom_vline()) to indicate 0 (i.e. the long-term global temperature mean calculated for 1951 - 1980).

    ggplot(temperature, aes(x = SON)) +
      geom_histogram(binwidth = 0.05, color = "black", fill = "darkorange") +
      geom_vline(xintercept = 0, color = "darkred", linetype = "dashed", size = 1)

    Figure 14.12: Histogram displaying distribution of mean global winter temperatures with customized binwidth and coloration and added vertical line (red).

    If we want to compare multiple distributions, box plots can be more helpful than histograms. Let’s say we wanted to compare the distribution of mean global temperature for each season.

    How can we go about this?

    Currently, our seasons are in individual columns, so our first step would be to create a tidy data set.

    tidy_season <- temperature %>%
      select(Year, DJF, MAM, JJA, SON) %>%
      pivot_longer(names_to = "season", values_to = "temperature", 2:5)

    Now we can plot our seasons on the x-axis and the distribution of temperatures on the y-axis using geom_boxplot().

    ggplot(tidy_season, aes(x = season, y = temperature)) +
      geom_boxplot()

    Figure 14.13: Box plot comparing global mean temperature anomalies relative to 1951-1980 average for all four seasons.

    Again, you can add additional information using the mapping aesthetics to e.g. color code the boxes by season.

    ggplot(tidy_season, aes(x = season, y = temperature, fill = season)) +
      geom_boxplot()

    Figure 14.14: Box plot comparing global mean temperature anomalies relative to 1951-1980 average for all four seasons color-coded by season.

    14.3 Faceting plots

    One of the advantages of ggplot2 being based not only on the grammar of graphics but also around the concept of a tidy data set is being able to create faceted plots. A faceted plot involves splitting a single plot into a matrix of panels, where each panel shows a different subset of the data. This is especially helpful during exploratory analysis where you might first plot all your data points in a single graph but then want to look at whether or not individual subsets within the data set behave the same.

    Let’s look at an example to better understand what faceting plots looks like. For example, let’s say we wanted to create individual plots of our bar plots showing our deviations of global temperatures from the 1951 - 1980 mean for each season.

    How could you create individual plots with the methods you are already familiar with?

    Here is how you can do it using facet_grid().

    Now we can plot our data and using facet_grid() we can specify that we want individual panels by month in separate rows.

    ggplot(tidy_season, aes(x = Year, y = temperature)) +
      geom_bar(stat = "identity", fill = "darkorange", color = "darkorange") +
      facet_grid(rows = vars(season))

    Figure 14.15: Change in temperature anomaly relative to long-term avarage per season (rows)

    You could also specify that you want the individual plots to be separated into columns.

    ggplot(tidy_season, aes(x = Year, y = temperature)) +
      geom_bar(stat = "identity", fill = "darkorange", color = "darkorange") +
      facet_grid(cols = vars(season))

    Figure 14.16: Change in temperature anomaly relative to long-term avarage per season (columns)

    You can also create faceted plot where subset your data by two variables and so you end up with one variable defining the rows and one the columns and you can use a simpler syntax row-variable ~ column-variable.

    You can also use this syntax if you are only faceting by one variable as we are doing in this example by specifying the variable and leaving the other one “blank” using a ..

    For example, to plot this faceted data set in rows, you would use the following syntax -

    ggplot(tidy_season, aes(x = Year, y = temperature)) +
      geom_bar(stat = "identity", fill = "darkorange", color = "darkorange") +
      facet_grid(season ~ .)

    Figure 14.17: Change in temperature anomaly relative to long-term avarage per season (rows)

    And to plot this data set column-wise you would specify it like so -

    ggplot(tidy_season, aes(x = Year, y = temperature)) +
      geom_bar(stat = "identity", fill = "darkorange", color = "darkorange") +
      facet_grid(. ~ season)

    Figure 14.18: Change in temperature anomaly relative to long-term avarage per season (columns)

    14.4 Customizing plots

    We have already played around a little bit with the options that we have in terms of customizing plots using color, fill, shapes, and sizes. But we’ve barely scratched the surface.

    You have probably noticed that there are some default settings like the gray background and white grid lines, font sizes of label axis, what the axis labels are, and even color schemes that are automatically used. Even using the defaults we get pretty clean plots that are visually appealing. This is super helpful during exploratory analysis because even though you playing around with the data you still have nicely formatted and easy to interpret figures.

    Once you have identified the central plots that you want to use to communicate the results and conclusions of your data analysis you will want to further customize your visualization to optimize communication, this includes how you encode data using color and shape but also making sure that everything is well labeled and clear to the person who is reading your report or listening to your presentation.

    One of the first things we frequently want to changes is the axis labels. ggplot2 does handily use the column names to automatically label your axes, so you will always have a label which is great during exploratory analysis but generally you will want to customize that for your final figure. The function labs() can be used to specify a title, subtitle, axis labels and additional annotations (caption) below the figure.

    For example we could customize our faceted figure to look like this:

    ggplot(tidy_season, aes(x = Year, y = temperature)) +
      geom_bar(stat = "identity", fill = "darkorange", color = "darkorange") +
      facet_grid(rows = vars(season)) +
      labs(title = "Change in global seasonal temperatures 1880 - 2020",
           subtitle = "Global mean temperatures relative to 1951 - 1980 mean",
           x = "Year", y = "Temperature [C]",
           caption = "Date source: NASA Goddard Institute for Space Studies")

    Figure 14.19: Change in temperature anomaly relative to long-term avarage per season (rows)

    Pretty much every component of a ggplot figure can be further customized using theme(), this includes things like font size, background and line colors, grids, legend position … ggplot2 odes have some pre-defined themes that you can call up that will change the layout.

    The default theme is theme_grey().

    ggplot(tidy_season, aes(x = Year, y = temperature)) +
      geom_bar(stat = "identity", fill = "darkorange", color = "darkorange") +
      facet_grid(rows = vars(season)) +
      labs(title = "Change in global seasonal temperatures 1880 - 2020",
           subtitle = "Global mean temperatures relative to 1951 - 1980 mean",
           x = "Year", y = "Temperature [C]",
           caption = "Date source: NASA Goddard Institute for Space Studies") +
      theme_grey()

    Figure 14.20: Change in temperature anomaly relative to long-term avarage per season (rows)

    Here, we explicitly specified it using theme_grey(); if you do not specify a theme this is the theme that will be used for your plot. There are other themes that are part of the ggplot package that include theme_bw(), theme_minimal(), theme_classic() or theme_light().

    Give it a whirl

    Re-plot the same figure using the four themes specified above.

    Did it!

    [Your answer here]

    You will likely find a theme that you like but still want or need to make additional tweaks. In that case the function theme() is your friend.

    Let’s go back to our bar plot of the global temperature anomaly to look at an example of likely the three most common things you will want to adjust which is the legend position, changing font size, color, turning x-axis labels by 90 degrees.

    temperature %>%
      mutate(year_type = ifelse(`J-D` <=0, "colder", "hotter")) %>%
      ggplot(aes(x = Year, y = `J-D`, fill = year_type, color = year_type)) +
      geom_bar(stat = "identity") +
      labs(title = "Change in global mean temperatures 1880 - 2020",
           subtitle = "Global mean temperatures relative to 1951 - 1980 mean",
           x = "Year", y = "Temperature [C]",
           caption = "Date source: NASA Goddard Institute for Space Studies") +
      theme_classic() +
      theme(legend.position = "bottom",
            axis.text.x = element_text(size = 12, color = "black",
                                       angle = 90, hjust = 0.5, vjust = 0.5),
            axis.text.y = element_text(size = 12, color = "black",),
            axis.title = element_text(size = 14))

    Figure 14.21: Change in mean global temperature relative to long-term avaerage with customized font sizes and legend positions.

    That’s starting to look pretty slick.

    As you start your visualization adventure the R Cookbook is a good resource. It is written in a helpful style that starts with a concrete plotting problem and then walks you through a solution.

    14.5 Arranging plots

    We’ve already seen that we can create multi-panel plots in a very straightforward way when we are plotting the same plot for different subsets of the data (facetted plots).

    But what if we want to generate individual plots that might fit together thematically but aren’t subsets that we can facet but we still want to be able to present them together? Fear not, this too can be solved; one option is using patchwork, a package designed for exactly this purpose. Let’s generate a few additional plots so we can try this out.

    In our data folder there is a tab-delimited file with monthly mean CO2 concentrations (parts per million) from Maunua Loa. Let’s read in that data set and create line plot with a regression line.

    carbon <- read_delim("data/CO2_monthly.txt", delim = "\t")
    
    ggplot(carbon, aes(x = date, y = average)) +
      geom_line(color = "blue") +
      geom_smooth(color = "red") +
      labs(title = "Atmospheric CO2",
           subtitle = "monthly mean CO2 Mauna Loa CO2",
           x = "year", y = "CO2 concentration air [ppm]",
           caption = "Data: NOAA/ESRL") +
      theme_classic()

    Figure 14.22: Mean monthly atmospheric CO2 concentrations measuread at Mauna Loa Observatory from 1958 - 2021.

    We also have a data set from the Global Carbon Project downloaded from the Our World in Data repository that contains atmospheric CO2 emissions. We can plot that as a simple line plot with a regression.

    emissions <- read_delim("data/emissions.txt", delim = "\t") %>%
      filter(iso_code == "OWID_WRL")
    
    ggplot(emissions, aes(x = year, y = co2)) +
      geom_line(color = "blue", size = 1) +
      geom_smooth(color = "red") +
      labs(title = "CO2 Emissions over time.",
           subtitle = "Global emissions atmospheric emissions by year.",
           x = "year", y = "CO2 [Gt/year]",
           caption = "Data: Global Carbon Project/Our World in Data") +
      theme_classic()

    Figure 14.23: Global atmospheric CO2 emissions.

    To be able to plot multiple plots in one using patchwork we need to assign our figures as objects.

    p1 <- ggplot(carbon, aes(x = date, y = average)) +
      geom_line(color = "blue") +
      geom_smooth(color = "red") +
      labs(title = "Atmospheric CO2",
           subtitle = "monthly mean CO2 Mauna Loa CO2",
           x = "year", y = "CO2 concentration air [ppm]",
           caption = "Data: NOAA/ESRL") +
      theme_classic()
    
    p2 <- ggplot(emissions, aes(x = year, y = co2)) +
      geom_line(color = "blue", size = 1) +
      geom_smooth(color = "red") +
      labs(title = "Atmospheric CO2 Emissions",
           subtitle = "global emissions",
           x = "year", y = "CO2 [Gt/year]",
           caption = "Data: Global Carbon Project/Our World in Data") +
      theme_classic()

    Now we can combine them side by side using a simple syntax.

    p1 + p2

    Figure 14.24: Mean monthly atmospheric CO2 concentrations measuread at Mauna Loa Observatory (left panel) and Global atmospheric CO2 emissions (right panel).

    Similarly, we can plot them underneath each other like so -

    p1 / p2

    Figure 14.25: Mean monthly atmospheric CO2 concentrations measuread at Mauna Loa Observatory (top panel) and Global atmospheric CO2 emissions (bottom panel).

    Oh, it gets better. Let’s say we wanted to plot our global mean temperatures in the top row and our two emissions plots below.

    p3 <- temperature %>%
      mutate(year_type = ifelse(`J-D` <=0, "colder", "hotter")) %>%
      ggplot(aes(x = Year, y = `J-D`, fill = year_type, color = year_type)) +
      geom_bar(stat = "identity") +
      labs(title = "Change in global mean temperatures 1880 - 2020",
           subtitle = "Global mean temperatures relative to 1951 - 1980 mean",
           x = "Year", y = "Temperature [C]",
           caption = "Date source: NASA Goddard Institute for Space Studies") +
      theme_classic() +
      theme(legend.position = "bottom",
            axis.text.x = element_text(size = 12, color = "black",
                                       angle = 90, hjust = 0.5, vjust = 0.5),
            axis.text.y = element_text(size = 12, color = "black",),
            axis.title = element_text(size = 14))
    
    p3 / (p1 | p2)

    Figure 14.26: Comparison of change in mean global temperature (top panel), atmospheric CO2 concentrations (bottom left), and atmospheric CO2 emissions (bottom right).

    You can create complex compositions in patchwork using syntax combining +, |, and \. To see how you can control the layout even further you can check out the documentation for patchwork.

    14.6 Exporting Plots

    The last thing we still have to figure out is how to save plots to file. You have a few options.

    The quickest and dirtiest option is to simply right click on the plot pane after plotting a figure and then save the figure using Save image as. This works and is easy to do but gives you very little control over the dimensions, resolution, file format etc.

    For more control over the format of your figure you can use the Export tab in the Plot pane which will allow you to adjust the dimensions and the the file format.

    Finally, the ggplot2 has a function called ggsave() that will allow you to determine the dimensions (width, height), resolution (dpi), and format (device).

    By default it will save the last plot that was plotted. If you specify the format in the file name (e.g. *.svg, *.jpg, *.png) it will automatically recognize the format11.

  • 11 Remember that we have designed our research compendium to keep raw and processed data, and results separate? Figures are considered results so you should always save them to the results folder.

  • temperature %>%
      mutate(year_type = ifelse(`J-D` <=0, "colder", "hotter")) %>%
      ggplot(aes(x = Year, y = `J-D`, fill = year_type, color = year_type)) +
      geom_bar(stat = "identity") +
      labs(title = "Change in global mean temperatures 1880 - 2020",
           subtitle = "Global mean temperatures relative to 1951 - 1980 mean",
           x = "Year", y = "Temperature [C]",
           caption = "Date source: NASA Goddard Institute for Space Studies") +
      theme_classic() +
      theme(legend.position = "bottom",
            axis.text.x = element_text(size = 12, color = "black",
                                       angle = 90, hjust = 0.5, vjust = 0.5),
            axis.text.y = element_text(size = 12, color = "black",),
            axis.title = element_text(size = 14))
    
    ggsave("results/global_temp.png", dpi = 300, width = 15, height = 10)

    Figure 14.27: Change in mean global temperature realtive to long-term average.

    Check your results folder to see if you were successful!

    If you assign your plot to an object you can use the plot argument of the ggsave() function to export it. This also works for figures that consist of multiple panels combined using patchwork.

    14.7 More Plots!

    Ready to take the training wheels off?

    Increasing global temperatures are effecting change across the different components of the climate system. There are several data sets in your data folder. Read in each data set as a data frame, use View() to take a look at what information is contained in the data set and the use your new found visualization skills to plot the data using ggplot.

    We have looked at rising mean global temperatures and discovered that rapidly increasing emissions and atmospheric concentrations of CO2 are consistent with increased concentrations of greenhouse gases driving temperature change.

    The effects of global warming are observed throughout components of the climate system including the atmosphere, the hydrosphere (marine and freshwater systems), cryosphere (land and sea ice), and lithosphere (earth’s surface/crust). Let’s look at a few data sets that illustrate the impact of rising temperatures.

    For all the plots, comment each line of your ggplot code (you can add comments between the lines or at the end), remember to caption your figure, list your data source and and label your axis in a meaningful way.

    Give it a whirl

    Increasing atmospheric temperatures result in land ice melting and the ocean heat content rising. The latter leads to thermal expansion which together with the increase freshwater inflow results in rising sea levels.

    The first data set is from the NOAA Laboratory for Satellite Altimetry and contains records from coastal sea level tide gauges. Start by reading in the data set:

    sealevel <- read_delim("data/sealevel.txt", delim = "\t")
    1. Take a quick look at the data set using View() and briefly summarize what data is contained in this data set12
    2. Plot the change in mean sea level over time as a line plot and color code the line(s) by method (see column names). Chose a theme and position your legend below the figure.
    3. Figure out how to use geom_hline() to add a line at 0.
    4. Comment each line of your ggplot code (you can add comments between the lines or at the end), remember to caption your figure, list your data source and and label your axis in a meaningful way.
    5. Write a short description of your figure summarizing the key results displayed.
  • 12 Include things like the number of columns, what data is contained in each sample, what time periods are included etc. Practice deducing these types of things from the context of the data set description above and by browsing the content.

  • Did it!

    Data Set:

    Plot

    Summary of results

    Here’s an example of what tour final figure could look like.

    Give it a whirl

    Our next data set is from the National Snow and Ice Data Center and contains the Arctic (Northern Hemisphere) July sea ice extent.

    ice_arctic <- read_delim("data/arctic_ice.txt", delim = "\t")
    1. Take a quick look at the data set using View() and briefly summarize what data is contained in this data set. Be specific.
    2. Plot the change in ice extent over time as a line plot. Chose a theme. Add a linear regression and confidence interval.
    3. Extra Challenge: Change the default colors of the regression line and the confidence interval.
    4. Comment each line of your ggplot code (you can add comments between the lines or at the end), remember to caption your figure, list your data source and and label your axis in a meaningful way.
    5. Write a short description of your figure summarizing the key results displayed.
    Did it!

    Data Set:

    Plot

    Summary of results

    Give it a whirl

    Let’s take a look at the equivalent data set depicting sea ice extent in the Antarctic (Southern Hemisphere.)

    ice_arctic <- read_delim("data/antarctic_ice.txt", delim = "\t")
    1. Take a quick look at the data set using View() and briefly summarize what data is contained in this data set. Be specific.
    2. Plot the change in ice extent over time as a line plot. Chose a new theme from what you’ve been using previously. Add a linear regression and confidence interval.
    3. Extra Challenge: Change the default colors of the regression line and the confidence interval.
    4. Comment each line of your ggplot code (you can add comments between the lines or at the end), remember to caption your figure, list your data source and and label your axis in a meaningful way.
    5. Write a short description of your figure summarizing the key results displayed.
    Did it!

    Data Set:

    Plot

    Summary of results

    Give it a whirl

    One more data set. This one is from the NOAA Hurricane Research Division. While interpreting the observed patterns in sea level and ice extent and modeling future projections is pretty straightforward, understanding whether extreme events are getting “worse” is a little bit more complicated13.

    Note: For this coding challenge you will need to combine your data wrangling and visualization skills!

    hurricane <- read_delim("data/hurricanes.txt", delim = "\t")

    Take a quick look at the data set using View() and briefly summarize what data is contained in this data set. Be specific14.

    Plot 1:

    When thinking about hurricanes there are two metrics to consider when determining whether or not “hurricanes have gotten worse”. First, we can look at whether or not hurricanes have become more common by plotting the number of hurricanes for each year.

    1. Create a faceted plot with individual line plots showing the number of named storms (this would include not only hurricanes put also tropical storms), the number of hurricanes, the number of major hurricanes, and the number of hurricanes in the United States.
    2. Add a linear regression.
    3. Chose a custom theme.
    1. Extra Challenge: We’ve previously used facet_grid() to create faceted plots which requires us to specify how panels should be laid out in rows and/or columns. Figure out how to use facet_wrap() to just specify which variable to use to create individual panels with out them all needing to be laid out next to each other in columns or underneath each other in rows.

    Plot 2:

    The other way to evaluate whether or not “hurricanes have gotten worse” is to not look at the number of storms but to determine whether individual storms or storms as a whole have become more intense and/or destructive. One of the columns you may not have been able to fully figure out what information it contains is the column containing information on the “accumulated cyclone energy”15.

    1. Create a bar plot showing the ACE for each year.
    2. Chose a theme and customize the color of your bars16 3. Extra Challenge: Use geom_hline() to add a horizontal line indicating the mean ACE for the recorded time period.
    3. Write a short description for each figure summarizing the key results displayed and argue whether or not you think “hurricanes have gotten worse”.
  • 16 You can specify color and fill for bar plots. Try setting the color and fill to the same and to different colors to see what happens when you have a bar plot with this many individual bars.

  • 15 Now that you have this information gp ahead and update your notes to add it to your short data set description.

  • 14 Note, several of the columns start with ‘Revised’ this just has to do with the entire data set having revised (most recent) definitions applied for the metrics included. You can ignore that component.

  • 13 We will take a closer look at this aspect when we combine our data wrangling and visualization skills to learn how to perform an exploratory analysis

  • You will have to reformat the data set in order to plot it in this fashion. It can be helpful to first think about what your ggplot code should look like to create the plot described above and then format your data set to fit those needs. Recall that ggplot is designed not only around the grammar of graphics but also the principles of tidy data… so, do we currently have a tidy data set? Or what would it need to look like for this plotting challenge?.

    Did it!

    Data Set:

    Plot 1:

    Plot 2:

    Summary of results

    Here is what a draft of your first plot could look like before you finalize things like titles and themes:

    Here is what a draft of your first plot could look like before you finalize things like titles and themes:

    Give it a whirl

    Choose five of the plots we have generated in this chapter and combine them into a single plot using patchwork; include at least two columns and two rows in your layout17.

    Save your plot in a way that retains proper proportions to make it easy for somebody else to easily see the details and discern the general patterns. Share your multi-panel plot in the #assignment channel of our slack workspace.

    Use your the figure you have created to support argue that “the observed changes in the different components of the climate system are internally and physically consistent with increased global temperatures”. Draw from the descriptions you made for the figures throughout this challenges to summarize what you observe to support this claim. In your answer you will want to consider which of the data sets refer to which component of the earth-climate system and why the data sets you have looked at are consistent with increasing mean global temperatures.

    Extra challenge: You can use #| fig-height and #| fig-width; code chunk options to specify the dimensions of the output of a figure in the knitted document18. When you initially render your quarto document you will find that the default dimensions will not format your you multi-panel plot in a way that does not make it super legible. Manipulate these parameters and re-knit your html to display your figure in an appropriate way.

  • 18 Yes, this does mean that you will need to check the format of your html output and potentially change it before you submit. Checking your html report before you submit is a good habit to have just to make sure that everything is rendering as expected

  • 17 This can include your temperature anomaly data set as well.

  • Did it!

    Figure

    The observed changes in the different components of the climate system are internally and physically consistent with increased global temperatures

    Consider this

    Consider the following and write a short paragraph (7-10 sentences):

    • Think back over the analyses we have performed for this and the previous chapter and argue whether the analysis you have performed should be considered descriptive, exploratory, inferential/predictive, or causal/mechanistic.
    • Describe both what you actually did, and then also briefly sketch out what your experimental design/analysis would need to look like if you had performed analysis in the other categories.
    • Causal/mechanistic analyses are the most involved frequently also the most difficult to perform. Argue whether or not your results are less meaningful or informative if they are not mechanistic/causal or whether they are still useful to support the initial thesis that climate change is unprecedented, unequivocally anthropogenic, and the impacts are already impacting all components of the earth-climate system.
    Did it!

    [Your answer here]